01. Deep Reinforcement Learning

INSTRUCTOR NOTE:

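$\mathcal{R}$ is the set of all rewards. The reward probability is specified jointly with the transition probability as the one-step dynamics $p(s', r \mid s, a) = \mathbb{P}(S_{t+1}=s', R_{t+1}=r \mid S_t=s, A_t=a)$, that is, the probability of reaching next state $s'$ and receiving reward $r \in \mathcal{R}$ after taking action $a$ in state $s$.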
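
As a rough illustration of what a joint specification means in practice, the sketch below stores p(s', r | s, a) as a lookup table and recovers the state-transition probability and the expected reward by summing over it. The two-state MDP, its state and action names, and every number in it are invented for this example; the helper names (`check_normalized`, `transition_probs`, `expected_reward`) are hypothetical and do not come from the lesson.

```python
from collections import defaultdict

# Hypothetical two-state MDP: all states, actions, rewards and probabilities
# are made up for illustration.
# dynamics[(s, a)] maps each outcome (s_next, r) to p(s_next, r | s, a).
dynamics = {
    ("s0", "stay"): {("s0", 0.0): 1.0},
    ("s0", "go"):   {("s1", 1.0): 0.8, ("s0", 0.0): 0.2},
    ("s1", "stay"): {("s1", 0.0): 1.0},
    ("s1", "go"):   {("s0", 5.0): 0.5, ("s0", 0.0): 0.25, ("s1", 0.0): 0.25},
}


def check_normalized(dynamics, tol=1e-9):
    """For every (s, a), the joint probabilities over (s', r) must sum to 1."""
    return all(
        abs(sum(outcomes.values()) - 1.0) <= tol
        for outcomes in dynamics.values()
    )


def transition_probs(dynamics, s, a):
    """Marginalize out the reward: p(s' | s, a) = sum_r p(s', r | s, a)."""
    probs = defaultdict(float)
    for (s_next, _r), p in dynamics[(s, a)].items():
        probs[s_next] += p
    return dict(probs)


def expected_reward(dynamics, s, a):
    """Expected one-step reward: r(s, a) = sum_{s', r} r * p(s', r | s, a)."""
    return sum(r * p for (_s_next, r), p in dynamics[(s, a)].items())


if __name__ == "__main__":
    print(check_normalized(dynamics))
    print(transition_probs(dynamics, "s1", "go"))
    print(expected_reward(dynamics, "s1", "go"))
```

Keeping the reward inside the joint table, rather than in a separate reward function, lets the same (s, a, s') transition carry different rewards with different probabilities, as in the ("s1", "go") entry, where moving to "s0" yields a reward of either 5.0 or 0.0.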